Abstract



We evaluate the mutual information between the input and the output of a two 
layer network in the case of a noisy and non-linear analogue channel. In the case where 
the non-linearity is small with respect to the variability in the noise, we derive an 
exact expression for the contribution to the mutual information given by the non- 
linear term in first order of perturbation theory. Finally we show how the calculation 
can be simplified by means of a diagrammatic expansion. Our results suggest that the
use of perturbation theories applied to neural systems might give insight into the
contribution of non-linearities to information transmission and, more generally, to
neuronal dynamics.
1 Introduction



The purpose of this work is to study the information properties of an analogue communication
channel, built from a two-layer neural network, receiving data from a Gaussian source.
The data are corrupted by Gaussian noise of known variance, and the output signals
are affected by random uncorrelated output noise.

In contrast to the case of a linear Gaussian channel, which can be easily solved even in
the presence of noise [?], [2], the exact calculation for the analogue channel requires
some assumptions on the relation between the non-linear term and the level of noise. In
particular, we assume a non-linearity that is small compared to the output noise. This
corresponds to the case where the sigmoidal transfer function is relatively flat and the channel
is noisy. Under this assumption, the mutual information between the output and the input of
the channel can be evaluated analytically. The perturbative approach by means of Feynman
diagrams [?], developed in this paper, allows us to represent the first-order perturbative
corrections in a direct and elegant way for every kind of non-linearity.

In contrast to the extreme case of the binary transfer function, where special mathematical
techniques [?], [?], [?] are introduced for the calculation of the mutual information,
the present analysis deals mainly with the effect of the non-linearity on the mutual
information and with a systematic way of investigating it. The problem of its maximization
with respect to the coupling matrix [?] will be considered elsewhere [?].

The paper is organized as follows: in Section 2 we introduce the model and in Section 3 
the mutual information is derived in the case of a general non-linear function. In Section 4 
we present the results for the typical case of cubic non-linearities. In Section 5 we develop 
the rules to express the perturbative series in terms of Feynman diagrams in the case of 
the same cubic non-linearity. In Section 6 we discuss the case of a general non-linearity. 
In Section 7 we briefly present the calculation of the mutual information in the case of a
generic non-local cubic non-linearity and explain how the diagram technique is modified. We
conclude with some final remarks and with future developments of this work.

2 The network model 

We consider a two-layer network with N continuous inputs $x = \{x_1, \ldots, x_N\}$, which are
Gaussian distributed and correlated through the matrix C:

$$\langle x_i \rangle = 0; \tag{1}$$
$$\langle x_i x_j \rangle = [C]_{ij}, \quad \forall i,j \in 1, 2, \ldots, N. \tag{2}$$

The signals are corrupted by uncorrelated Gaussian input noise $\nu = \{\nu_1, \ldots, \nu_N\}$, with

$$\langle \nu_i \rangle = 0; \tag{3}$$
$$\langle \nu_i \nu_j \rangle = b_0 \delta_{ij}, \quad \forall i,j \in 1, 2, \ldots, N. \tag{4}$$

The output vector is a function of the noisy input $x + \nu$, transformed via the couplings
$\{J_{ij}\}$:

$$V = G(J(x + \nu)) + z. \tag{5}$$

We also assume that the output signals are affected by random uncorrelated output
noise $z = \{z_1, \ldots, z_N\}$, with the following Gaussian distribution:

$$\langle z_i \rangle = 0; \tag{6}$$
$$\langle z_i z_j \rangle = b \delta_{ij}, \quad \forall i,j \in 1, 2, \ldots, N. \tag{7}$$

The transfer function G(x) is a smooth continuous function, which typically has a sigmoidal
shape in the case of analogue neuronal devices [?], [10], [11]. One possible choice is:

$$G(x) = \tanh(\beta x), \tag{8}$$

where the parameter $\beta$ modulates the steepness of the curve. A linear input-output
relationship has already been considered in the context of the mutual information in previous
works [?]. Here we examine the contribution to information transmission given by a small
non-linear term in the channel transfer function.

Assuming that the argument of the transfer function is small, a Taylor expansion of eq. (8)
gives:

$$G(x) = \tanh(\beta x) \simeq \beta x - \frac{\beta^3 x^3}{3} + o(x^4), \tag{9}$$

where the higher order terms are all odd powers of x. Thus the output of the channel
can be written as:

$$V \simeq h + g(h) + z, \tag{10}$$

where g(h) is a generic non-linear term. For example, it could be the cubic term or a
higher order term in the expansion of $\tanh(\beta x)$, written in terms of $h = J(x + \nu)$.
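As a quick numerical illustration of the expansion in eq. (9), the following minimal sketch compares $\tanh(\beta x)$ with its truncation at the cubic term. The value of `beta` and the sample points are arbitrary choices made for this example and do not come from the paper.

```python
import math

def tanh_cubic(x, beta):
    """First two terms of the Taylor series of tanh(beta * x)."""
    u = beta * x
    return u - u**3 / 3.0

beta = 0.5  # illustrative steepness, not a value from the paper
for x in (0.05, 0.2, 0.5):
    exact = math.tanh(beta * x)
    approx = tanh_cubic(x, beta)
    print(f"x={x:4.2f}  tanh={exact:.6f}  cubic={approx:.6f}  error={abs(exact - approx):.2e}")
```

The error grows with the fifth power of $\beta x$, which is why the truncation is only appropriate for a relatively flat transfer function or small local fields.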

We are interested in the mutual information [13] between the input and the output
signals:

$$I = \int dx \int dV\, P(V, x) \log_2 \frac{P(V, x)}{P(V) P(x)}. \tag{11}$$

It is easy to show that I can be written as the difference between the output entropy and
the "equivocation" between the output and the input:

$$I = H(V) - \langle H(V|x) \rangle_x, \tag{12}$$

where

$$H(V) = -\int dV\, P(V) \log_2 P(V) \tag{13}$$

and

$$\langle H(V|x) \rangle_x = -\int dx \int dV\, P(x) P(V|x) \log_2 P(V|x). \tag{14}$$

In the next section we present the calculation of the mutual information separately for
the output entropy H(V) and for the equivocation $\langle H(V|x) \rangle_x$, in the case of a
non-linear channel.
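To make the decomposition in eq. (12) concrete, the sketch below evaluates it for the scalar linear channel (g = 0), using the standard entropy of a Gaussian variable, $\frac{1}{2}\log_2(2\pi e \sigma^2)$. This closed form is a textbook result that is not derived in the text, and the function name and the scalar coupling `j` are hypothetical choices for this example.

```python
import math

def linear_channel_information(c, b0, b, j=1.0):
    """Mutual information in bits for the scalar linear channel V = j*(x + nu) + z."""
    var_h = j * j * (c + b0)        # variance of h = j*(x + nu)
    var_h_given_x = j * j * b0      # variance of h once x is fixed
    h_out = 0.5 * math.log2(2 * math.pi * math.e * (var_h + b))           # H(V)
    h_cond = 0.5 * math.log2(2 * math.pi * math.e * (var_h_given_x + b))  # <H(V|x)>_x
    return h_out - h_cond

# Signal variance 3, no input noise, unit output noise: half the log of the variance ratio 4.
print(linear_channel_information(c=3.0, b0=0.0, b=1.0))
```

The perturbative corrections computed in the following sections are added on top of this linear-channel baseline.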



3 The Mutual Information 

3.1 Evaluation of the output entropy 



Let us consider the probability for the output signals P(V) in eq. (13). If the non-linear
term g(h), present in eq. (10), were equal to zero, the evaluation of H(V) would be trivial,
as V would be a linear combination of Gaussian variables. In order to extract explicitly the
dependence of P(V) on the non-linear term g(h), we introduce the conditional probability
P(V|h):

$$P(V) = \int dh\, P(h) P(V|h); \tag{15}$$
$$P(V|h) = \frac{1}{\sqrt{2\pi b}}\, e^{-(V - h - g(h))^2 / 2b}. \tag{16}$$

Expanding P(V) to first order in g, assuming a non-linearity that is small compared to
the variance of the output noise, we obtain:

$$P(V) \simeq P_0(V) \left[ 1 + \frac{1}{b} \langle g(h)^T (V - h) \rangle_h + o(g) \right], \tag{17}$$

where

$$P_0(V) = \int dh\, P(h) P_0(V|h), \tag{18}$$
$$P_0(V|h) = \frac{1}{\sqrt{2\pi b}}\, e^{-(V - h)^2 / 2b}, \tag{19}$$
$$\langle F(h, V) \rangle_h = \frac{\int dh\, P(h) P_0(V|h) F(h, V)}{\int dh\, P(h) P_0(V|h)}, \tag{20}$$

and we have assumed that higher order terms in the ratio g/b are negligible.
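The first-order expansion of the Gaussian kernel can be checked numerically in the scalar case. The sketch below compares the exact conditional density of eq. (16) with its linearization in g; the function names and the numerical values are illustrative assumptions, not taken from the paper.

```python
import math

def gauss(u, b):
    """Zero-mean Gaussian density with variance b, evaluated at u."""
    return math.exp(-u * u / (2.0 * b)) / math.sqrt(2.0 * math.pi * b)

def p_exact(v, h, g, b):
    """Scalar version of eq. (16): output noise centred on h + g."""
    return gauss(v - h - g, b)

def p_first_order(v, h, g, b):
    """Linearization in g: P0(V|h) * (1 + g * (V - h) / b)."""
    return gauss(v - h, b) * (1.0 + g * (v - h) / b)

v, h, b = 0.5, 0.0, 1.0   # illustrative values
for g in (1e-1, 1e-2, 1e-3):
    diff = abs(p_exact(v, h, g, b) - p_first_order(v, h, g, b))
    print(f"g={g:.0e}  |exact - first order| = {diff:.2e}")
```

The discrepancy shrinks quadratically with g, consistent with neglecting higher order terms in g/b.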
Substituting eq. (17) in the expression for the output entropy H(V), we obtain to
first order in g:

$$H(V) \simeq H_0(V) - \frac{1}{b} \int dh \int dV\, P(h) P_0(V|h)\, g(h)^T (V - h) \log_2 P_0(V). \tag{21}$$

Here $H_0(V)$ is the output entropy in the case of a linear channel [?], [?].

We recall that $P_0(V)$ is the probability for the output V when g(h) = 0. In this case V
is a linear combination of zero-mean Gaussian variables and its distribution is a Gaussian
centered at zero with a covariance matrix given by:

$$\langle V_i V_j \rangle = [J C J^T + b_0 J J^T + b I]_{ij} = [A + bI]_{ij}, \tag{22}$$

where we have set $A = J C J^T + b_0 J J^T$.

$\log_2 P_0(V)$¹ can also be written explicitly, up to an additive constant, in the following way:

$$\log_2 P_0(V) = -\tfrac{1}{2} V^T (A + bI)^{-1} V$$
$$= -\tfrac{1}{2} \left[ (V - h)^T [A + bI]^{-1} (V - h) + h^T [A + bI]^{-1} (V - h) + (V - h)^T [A + bI]^{-1} h + h^T [A + bI]^{-1} h \right], \tag{23}$$

which can now be easily integrated over the Gaussian distributions P(h) and
$P_0(V|h) = P_0(V - h)$. Since g(h) is an odd power of h, like every term in the expansion of the
transfer function (8), only the second and the third term in the sum in eq. (23) give non-zero
contributions. Thus the first-order term in the expression for the output entropy,
eq. (21), becomes:

$$\frac{1}{2b} \int dh \int dV\, P(h) P_0(V|h)\, g(h)^T (V - h) \left[ h^T [A + bI]^{-1} (V - h) + (V - h)^T [A + bI]^{-1} h \right]. \tag{24}$$

The integration over V leads to the final expression for the output entropy in terms of a
general non-linearity g(h):

$$H(V) \simeq H_0(V) + \Delta H(V), \tag{25}$$
$$\Delta H(V) = \int dh\, P(h)\, g(h)^T [A + bI]^{-1} h. \tag{26}$$

The evaluation of the integral over h requires a specific choice for the non-linearity.
Before introducing it, we show how to obtain a similar expression for the equivocation term
$\langle H(V|x) \rangle_x$.

3.2 Calculation of the equivocation term

We recall the expression of the equivocation term:

$$\langle H(V|x) \rangle_x = -\int dx \int dV\, P(x) P(V|x) \log_2 P(V|x). \tag{27}$$

The evaluation of this term can be carried out in a very similar way to that of the output
entropy. We use the identity:

$$P(V|x) = \int dh\, P(V|h) P(h|x). \tag{28}$$

Then, expanding P(V|h) in powers of g(h)/b up to first order as in eq. (17), we obtain:

$$P(V|x) \simeq P_0(V|x) \left[ 1 + \frac{1}{b} \langle g(h)^T (V - h) \rangle_{h|x} + o(g) \right], \tag{29}$$



where

$$P_0(V|x) = \int dh\, P(h|x) P_0(V|h), \tag{30}$$
$$P_0(V|h) = \frac{1}{\sqrt{2\pi b}}\, e^{-(V - h)^2 / 2b}, \tag{31}$$
$$\langle F(h, V) \rangle_{h|x} = \frac{\int dh\, P(h|x) P_0(V|h) F(h, V)}{\int dh\, P(h|x) P_0(V|h)}. \tag{32}$$

¹ We implicitly absorb the constant $1/\log_e 2$ in the definition of $\log_2$ whenever we change
base from $\log_2$ to $\log_e$.

Substituting eq. (29) in the expression of the equivocation term, we obtain:

$$\langle H(V|x) \rangle_x \simeq \langle H_0(V|x) \rangle_x - \frac{1}{b} \int dh \int dx \int dV\, P(x) P(h|x) P_0(V|h)\, g(h)^T (V - h) \log_2 P_0(V|x). \tag{33}$$

Here the conditional probability $P_0(V|x)$ is:

$$P_0(V|x) = \frac{1}{\sqrt{2\pi \det[B + bI]}}\, e^{-(V - Jx)^T [B + bI]^{-1} (V - Jx)/2}, \tag{34}$$

where B + bI is the correlation matrix between the outputs in the absence of signals, at g = 0.
From eqs. (10), (3) and (7) one can derive:

$$\langle (V_i - [Jx]_i)(V_j - [Jx]_j) \rangle = [B + bI]_{ij}, \qquad B = b_0 J J^T. \tag{35}$$
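The two covariance matrices entering the calculation, A of eq. (22) and B of eq. (35), can be built explicitly for a small network. The sketch below uses plain Python lists; the couplings, correlations and noise level are illustrative values chosen for this example, not values from the paper.

```python
def matmul(x, y):
    """Product of two matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def transpose(x):
    return [list(row) for row in zip(*x)]

# Illustrative values only; none of these numbers come from the paper.
J = [[1.0, 0.5], [0.2, 1.0]]   # couplings
C = [[1.0, 0.3], [0.3, 1.0]]   # input correlations
b0 = 0.1                       # input-noise variance

Jt = transpose(J)
signal = matmul(matmul(J, C), Jt)                                    # J C J^T
B = [[b0 * v for v in row] for row in matmul(J, Jt)]                 # eq. (35)
A = [[s + n for s, n in zip(rs, rn)] for rs, rn in zip(signal, B)]   # eq. (22)

print("A =", A)
print("B =", B)
```

By construction A - B equals the signal part $J C J^T$, which is the only place where the input correlations enter.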
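The expression for the equivocation term becomes:

$$\langle H(V|x) \rangle_x \simeq \langle H_0(V|x) \rangle_x + \Delta \langle H(V|x) \rangle_x, \tag{36}$$

where

$$\Delta \langle H(V|x) \rangle_x = \frac{1}{2b} \int dh \int dx \int dV\, P(x) P(h|x) P_0(V|h)\, g(h)^T (V - h)\, (V - Jx)^T [B + bI]^{-1} (V - Jx). \tag{37}$$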

The integration over V is easily carried out, as $P_0(V|h)$ is Gaussian, by using the
replacement:

$$V - Jx = (V - h) + (h - Jx). \tag{38}$$

The final expression to be integrated over V, h and x becomes:

$$\Delta \langle H(V|x) \rangle_x = \frac{1}{2b} \int dh \int dx \int dV\, P(x) P(h|x) P_0(V|h)\, g(h)^T (V - h) \left[ (V - h)^T [B + bI]^{-1} (h - Jx) + (h - Jx)^T [B + bI]^{-1} (V - h) \right]. \tag{39}$$
The integration over V gives for the equivocation term:

$$\langle H(V|x) \rangle_x \simeq \langle H_0(V|x) \rangle_x + \int dH \int dx\, P(x) P(H)\, g(H + Jx)^T [B + bI]^{-1} H, \tag{40}$$

where we have changed variable from h to H = h - Jx.² This expression is our final result
for the equivocation term in the case of a general non-linear function g(h).
Combining eqs. (25), (26) and (40), the mutual information reads:

$$I = I_0 + \int dh\, P(h)\, g(h)^T [A + bI]^{-1} h - \int dH \int dx\, P(H) P(x)\, g(H + Jx)^T [B + bI]^{-1} H, \tag{41}$$

where $I_0$ is the mutual information in the absence of non-linearities.

4 Cubic non-linearity 

The final expression for the mutual information has been obtained in the case of a generic
non-linearity g(h). To carry the calculation further, we have to specify its form. Let us
consider the first non-linear term in the expansion of the sigmoidal transfer function (8):

$$g(h) = -g_0 h^3, \tag{42}$$

understood componentwise, $g_i(h) = -g_0 h_i^3$, where we have set $\beta^3/3 = g_0$. By using
the Wick theorem [?] and $\langle h_i h_j \rangle = A_{ij}$, the integration over h in eq. (26) can be
carried out quite easily, and the final expression for the output entropy H(V) for this
special choice of g is:

$$H(V) = H_0(V) - 3 g_0 \sum_{i,j} A_{ii} [A + bI]^{-1}_{ij} A_{ij}. \tag{43}$$
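The factor 3 in eq. (43) comes from the Wick contraction $\langle h_i^3 h_j \rangle = 3 A_{ii} A_{ij}$. As an illustration, the following sketch evaluates zero-mean Gaussian moments by summing over all pairings; `wick_moment` is a hypothetical helper written for this example, and the covariance matrix is an arbitrary choice.

```python
def wick_moment(indices, cov):
    """<h_a h_b ...> for a zero-mean Gaussian vector, as a sum over all pairings."""
    if not indices:
        return 1.0
    if len(indices) % 2:
        return 0.0          # odd moments of a zero-mean Gaussian vanish
    first, rest = indices[0], indices[1:]
    total = 0.0
    for pos, partner in enumerate(rest):
        remaining = rest[:pos] + rest[pos + 1:]
        total += cov[first][partner] * wick_moment(remaining, cov)
    return total

A = [[2.0, 0.5], [0.5, 1.0]]   # illustrative covariance of h
print(wick_moment([0, 0, 0, 1], A), 3 * A[0][0] * A[0][1])
```

Each of the three copies of $h_i$ can be paired with $h_j$, leaving a loop $A_{ii}$ from the remaining two, which is the counting that the diagrams of Section 5 make graphical.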



The evaluation of the integrals over x and H in eq. (40) for the equivocation term can
be carried out with the same procedure. As only even powers of both variables give a non-zero
contribution, only the terms $H_i^3 + 3 H_i [Jx]_i [Jx]_i$ in the expansion of g(H + Jx) remain. The
integration over x and H gives:

$$\langle H(V|x) \rangle_x = \langle H_0(V|x) \rangle_x - 3 g_0 \sum_{i,j} [B + bI]^{-1}_{ij} \left\{ B_{ij} B_{jj} + B_{ij} [J C J^T]_{jj} \right\}. \tag{44}$$

From eqs. (43) and (44) we derive the expression for the mutual information:

$$I = I_0 - 3 g_0 \sum_{i,j} A_{ii} \left\{ [A + bI]^{-1}_{ij} A_{ij} - [B + bI]^{-1}_{ij} B_{ij} \right\}, \tag{45}$$

where we have used that $A - B = J C J^T$ from eqs. (22) and (35).
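The first-order term of eq. (45) is straightforward to evaluate numerically. The sketch below does so for a two-unit network, with a hand-written 2x2 inverse to stay self-contained; the function names and the numerical values are assumptions made for this example.

```python
def inv2(m):
    """Inverse of a 2x2 matrix stored as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def cubic_correction(A, B, b, g0):
    """-3 g0 sum_ij A_ii ([A+bI]^-1_ij A_ij - [B+bI]^-1_ij B_ij), as in eq. (45)."""
    n = 2
    MA = inv2([[A[i][j] + (b if i == j else 0.0) for j in range(n)] for i in range(n)])
    MB = inv2([[B[i][j] + (b if i == j else 0.0) for j in range(n)] for i in range(n)])
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += A[i][i] * (MA[i][j] * A[i][j] - MB[i][j] * B[i][j])
    return -3.0 * g0 * total

# Two uncoupled units with unit signal variance and unit input noise (illustrative).
A = [[2.0, 0.0], [0.0, 2.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
print(cubic_correction(A, B, b=1.0, g0=0.1))
```

When A = B, that is when no signal is present, the correction vanishes, as it should for a quantity measuring transmitted information.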

² Notice that P(h) is different from P(H): both distributions are Gaussian, but with different
covariances, as H = h - Jx; $\langle h_i h_j \rangle = A_{ij}$, while $\langle H_i H_j \rangle = B_{ij}$. Matrices A and B
are given respectively in eq. (22) and eq. (35).

An interesting issue to investigate is whether the contribution to the mutual information
given by higher order non-linear terms in the transfer function (8) is positive or negative
when varying the level of noise and the strength of the correlations. Here we just mention the
case where the synaptic connections are positive and the input units are independent. In this
very simple case it is easy to see from eq. (45) that the contribution to the mutual
information is negative. This makes sense, as the information carried by independent units
is found to be additive; it is reasonable to think that a small negative non-linearity in the
transfer function, which accounts for the saturation of the output at a given threshold,
depresses the information. A more detailed study of the effect of a non-linearity with respect
to an enhancement or depression of the mutual information will be the object of future
investigations.

In the limit of vanishing output noise, $b \to 0$, by using the
fact that A and B are invertible matrices, we get:

$$I = I_0 - 3 g_0 b \sum_i A_{ii} (B^{-1} - A^{-1})_{ii} + O(b^2). \tag{46}$$
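The small-noise limit of eq. (46) can be compared with the full first-order term of eq. (45) in the scalar case, where the matrices A and B reduce to two positive numbers. The sketch below is an illustration under that scalar assumption; the function names and values are hypothetical.

```python
def exact_term(A, B, b, g0):
    """Scalar version of the first-order term in eq. (45)."""
    return -3.0 * g0 * A * (A / (A + b) - B / (B + b))

def small_noise_term(A, B, b, g0):
    """Scalar version of the leading term in eq. (46)."""
    return -3.0 * g0 * b * A * (1.0 / B - 1.0 / A)

A, B, g0 = 2.0, 1.0, 0.1   # illustrative values with A > B
for b in (1e-1, 1e-2, 1e-3):
    gap = abs(exact_term(A, B, b, g0) - small_noise_term(A, B, b, g0))
    print(f"b={b:.0e}  exact={exact_term(A, B, b, g0):+.6e}  gap={gap:.2e}")
```

The gap between the two expressions decreases as $b^2$, in line with the $O(b^2)$ remainder.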

5 Diagrammatic approach for a cubic non-linearity 

It is well known from perturbation theory [?] that a series of Gaussian integrals can be
expressed as a diagrammatic expansion, which makes the evaluation of high order contributions
faster and more elegant. We show here how the evaluation of the integrals (24) and (39) can be
expressed in terms of Feynman diagrams. Even if the formalism we develop is specific to our
case, this is, to our knowledge, the first attempt to introduce a diagrammatic technique to
take into account high order effects in information transmission in a progressive, controlled way.

We summarize here the definitions and the rules that allow us to build the diagrams. The
general formalism can be found in [?].

To evaluate the output entropy in eq. (24), we introduce the following components of the
graphs and rules to connect them:

1. Each term $V_i - h_i$ is represented by a wiggly line.

2. Each term $h_i$ is represented by a solid line.

3. Each matrix element $[A + bI]^{-1}_{ij}$ is represented by a dashed square.

4. The integration over $h_i, h_j$ corresponds to the contraction of two solid lines coming out
of vertices i, j, which produces the matrix element $A_{ij}$.

5. The integration over $V_i, V_j$ corresponds to the contraction of two wiggly lines coming
out of vertices i, j, which produces a term $b \delta_{ij}$.

Let us consider the case of the cubic non-linearity and let us set $g(h) = -g_0 h^3$. Following
the rules listed above, we can identify each factor in the integrand with a diagram:

$$h_i^3 (V_i - h_i)$$
$$(V_i - h_i) [A + bI]^{-1}_{ij} h_j$$
$$h_i [A + bI]^{-1}_{ij} (V_j - h_j)$$

[Diagrams for the three factors not reproduced.]

The result of the integrations is expressed as a series of diagrams obtained by connecting
the lines in the first diagram with the lines in the second and in the third diagram, in order
to construct all the topologically distinct and connected diagrams:

$$H(V) \simeq H_0(V) - \frac{g_0}{2b} \sum_{ijk} \Big[ \text{(diagram)} + \text{(diagram)} \Big]. \tag{47}$$

Each of the three solid lines coming out of the first diagram can be connected with the
solid line coming out of the second diagram, and similarly for the third diagram, while
the remaining two solid lines are contracted in a loop; thus we obtain 6 times the
same diagram:

[Diagram (48) not reproduced.]

It is easy to check that, applying the rules for the contractions of wiggly and solid lines,
one obtains an expression for the output entropy that coincides with eq. (43).

Now we introduce analogous graphic rules for the evaluation of the equivocation, eq. (39);
some rules are the same as the ones listed above, but we need a new element in the graph to
represent the vector Jx. The full prescription is given below:

1. Each term $V_i - h_i$ is represented by a wiggly line.

2. Each term $H_i$ is represented by a solid line.

3. Each term $[Jx]_i$ is represented by a dashed line.

4. Each matrix element $[B + bI]^{-1}_{ij}$ is represented by an empty square.

5. The integration over $H_i, H_j$ corresponds to the contraction of two solid lines coming
out of vertices i, j, which gives the matrix element $B_{ij}$.

6. The integration over $V_i, V_j$ corresponds to the contraction of two wiggly lines coming
out of vertices i, j, which gives the term $b \delta_{ij}$.

7. The integration over $x_i, x_j$ corresponds to the contraction of two dashed lines coming
out of vertices i, j, which gives the matrix element $[J C J^T]_{ij}$.
Let us consider the case of the cubic non-linearity. Moreover, let us set $H_i = h_i - [Jx]_i$.
Then

$$g_0 (H_i + [Jx]_i)^3 \to g_0 \left( H_i^3 + 3 H_i [Jx]_i [Jx]_i \right), \tag{49}$$

as odd powers of H and x give zero contribution to the integral.

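As in the case of the output entropy, we can identify the different factors multiplied in
the integrand in eq. (39) with different diagrams:

$$H_i^3 (V_i - h_i)$$
$$H_i [Jx]_i [Jx]_i (V_i - h_i)$$
$$(V_i - h_i) [B + bI]^{-1}_{ij} H_j$$
$$H_i [B + bI]^{-1}_{ij} (V_j - h_j)$$

[Diagrams for the four factors not reproduced.]

Thus the expression for the equivocation can be written in the following way:

$$\langle H(V|x) \rangle_x \simeq \langle H_0(V|x) \rangle_x + \text{(diagrammatic terms, not reproduced)}. \tag{50}$$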

Now we have to connect both the first and the second diagram to the third and to the
fourth diagram in all possible ways to obtain fully connected diagrams.

It is easy to see that the contraction of the first diagram with the third and the fourth
ones gives 6 times the diagram already obtained in the case of the output entropy, eq. (48).

The contraction of the second diagram with the third and with the fourth diagrams gives
a new contribution:

[Diagram not reproduced.]

Writing together the two contributions, we obtain the expression for the equivocation,
which is equal to eq. (44), as expected.
6 Diagrammatic expansion and mutual information in the case of higher order non-linearities

We show here how to obtain the diagrammatic expansion and the final expression for the
mutual information in the case of higher order non-linearities. This eventually allows us to
evaluate the contribution given by each term in the expansion of the transfer function (8).
Let us consider a generic term $g_0 h_i^{2n+1}$.³

The evaluation of the integrals (24) and (39) can be carried out in a very similar way.
We make the following substitutions:

$$h_i^3 \to h_i^{2n+1}, \tag{51}$$
$$(H_i + [Jx]_i)^3 \to (H_i + [Jx]_i)^{2n+1} \to \sum_{l=0}^{n} \binom{2n+1}{2l} H_i^{2n+1-2l} ([Jx]_i)^{2l}. \tag{52}$$



Here the binomial expansion of $(H_i + [Jx]_i)^{2n+1}$ contains only even powers of $[Jx]_i$, because
odd powers give zero contribution when integrated over x.
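The terms that survive in eq. (52) can be listed mechanically. The sketch below enumerates, for a given n, the binomial coefficient and the powers of $H_i$ and $[Jx]_i$ for each even power 2l; `even_terms` is a hypothetical helper written for this illustration.

```python
from math import comb

def even_terms(n):
    """Terms of (H + Jx)^(2n+1) with an even power 2l of Jx.

    Each entry is (binomial coefficient, power of H, power of Jx).
    """
    return [(comb(2 * n + 1, 2 * l), 2 * n + 1 - 2 * l, 2 * l) for l in range(n + 1)]

for n in (1, 2):
    print(n, even_terms(n))
```

For n = 1 this reproduces the two terms $H_i^3$ and $3 H_i [Jx]_i^2$ of eq. (49).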

These changes correspond to analogous replacements in the basic diagrams:

[Diagrams (53) and (54) not reproduced.]




The double solid line in the upper diagram on the right-hand side is a shorthand for a set of
2n + 1 solid lines.

In the lower diagram on the right-hand side, the double dotted line stands for a set of 2l
single dotted lines, with l = 0, ..., n, and the double solid line represents a set of
2n + 1 - 2l single solid lines. Then the diagrammatic equations for the output entropy and for
the equivocation in the case of a generic non-linearity of order 2n + 1 can be written as follows:



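$$H(V) \simeq H_0(V) + \frac{g_0}{2b} \sum_{ijk} \Big[ \text{(diagram)} + \text{(diagram)} \Big], \tag{55}$$

$$\langle H(V|x) \rangle_x \simeq \langle H_0(V|x) \rangle_x + \frac{g_0}{2b} \sum_{l=0}^{n} \binom{2n+1}{2l} \Big[ \text{(diagrams, not reproduced)} \Big].$$

³ We always call $g_0$ the constants depending on the parameter $\beta$, in order not to introduce too many parameters.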
Constructing all the topologically distinct diagrams, according to the rules given above,
one can derive the final expression for the mutual information:

$$I = I_0 + (2n+1)(2n-1)!!\, g_0 \sum_{ij} [A + bI]^{-1}_{ij} A_{ij} A_{ii}^{n} - g_0 \sum_{l=0}^{n} \binom{2n+1}{2l} (2n+1-2l)(2n-2l-1)!!\,(2l-1)!! \sum_{ij} [B + bI]^{-1}_{ij} B_{ij} B_{ii}^{n-l} [J C J^T]_{ii}^{l}, \tag{56}$$

with the convention $(-1)!! = 1$.

Eq. (56) is the final expression in the case of a generic non-linear function of the type
$g_i = g_0 h_i^{2n+1}$, for which the diagrammatic techniques provide an easy and direct way to
calculate the mutual information. Since in the case of the sigmoidal function (8) the expansion
includes only odd powers of $h_i$, the derivation of the diagrammatic series for the whole
Taylor expansion is straightforward, at least up to first order in g/b. This shows how the
diagrammatic technique provides a compact and easily readable expression for the mutual
information in the case of a non-linear noisy analogue channel.
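The Gaussian moments entering the higher order terms can be checked in the scalar case. The sketch below evaluates $\langle (H + Jx)^{2n+1} H \rangle$ term by term from the binomial expansion of eq. (52), using $\langle y^{2k} \rangle = (2k-1)!!\, \sigma^{2k}$ for a zero-mean Gaussian, and compares it with the compact form $(2n+1)!!\, B A^n$, where A = B + S. The scalar setting, the symbol S for the variance of Jx, and the helper names are assumptions made for this illustration.

```python
from math import comb

def double_factorial(k):
    """k!! for odd k, with the convention (-1)!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def even_moment(power, variance):
    """<y^power> for a zero-mean Gaussian y with the given variance (power even)."""
    return double_factorial(power - 1) * variance ** (power // 2)

def equivocation_moment(n, B, S):
    """<(H + Jx)^(2n+1) H> for independent scalars with variances B and S."""
    return sum(comb(2 * n + 1, 2 * l)
               * even_moment(2 * n + 2 - 2 * l, B)   # <H^(2n+2-2l)>
               * even_moment(2 * l, S)               # <(Jx)^(2l)>
               for l in range(n + 1))

n, B, S = 2, 1.0, 0.5   # illustrative values
print(equivocation_moment(n, B, S), double_factorial(2 * n + 1) * B * (B + S) ** n)
```

The agreement of the two numbers reflects the fact that H + Jx is itself Gaussian with variance A, so the sum over l can be resummed.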



7 Generalization to non-local forms of cubic non-linearity

Let us now investigate the case of a non-local non-linearity, which depends on the local fields
of all outputs. This could correspond, for example, to the case where the global output of
the network is constrained in such a way that the local outputs of the single units depend
on the total structure of the connectivities. The general case of non-linearities of order
2n + 1 is quite complex, but the analysis can be carried out quite easily in the case of a
cubic non-local non-linearity. The most general third order term can be written as:



$$g_i(h) = \sum_{klm} g^i_{klm} h_k h_l h_m. \tag{57}$$

Substituting eq. (57) in eqs. (24) and (39), it is easy to check that the output entropy H(V)
and the equivocation $\langle H(V|x) \rangle_x$ can be written as diagrammatic equations. The definitions
for lines and vertices given in the previous section remain valid in this more complex case as
well. It is enough to replace the basic diagrams derived for the local cubic non-linearity:

[Diagrams not reproduced.]




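The diagrammatic equations for the output entropy and for the equivocation become:

$$H(V) \simeq H_0(V) + \frac{1}{2b} \sum_{ikl} \sum_{mj} g^i_{klm} \Big[ \text{(diagram)} + \text{(diagram)} \Big], \tag{58}$$

$$\langle H(V|x) \rangle_x \simeq \langle H_0(V|x) \rangle_x + \frac{1}{2b} \sum_{ikl} \sum_{mj} g^i_{klm} \Big[ \text{(diagrams, not reproduced)} \Big].$$

Following the rules for the contraction of the wiggly and solid lines, it is easy to derive the
final expression for the mutual information:

$$I = I_0 + 3 \sum_{ikl} \sum_{jm} g^i_{klm} A_{kl} \left\{ [A + bI]^{-1}_{ij} A_{jm} - [B + bI]^{-1}_{ij} B_{jm} \right\}. \tag{59}$$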

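We list some specific cases arising from this generic non-linearity and the corresponding
final expressions for the mutual information:

case 1

$$g^i_{klm} = g_0 \delta_{ki} \delta_{li} \delta_{mi}, \tag{60}$$

leading to the case already analyzed: $\sum_{klm} g^i_{klm} h_k h_l h_m = g_0 h_i^3$. The expression for the
mutual information is given by eq. (45).

case 2

$$g^i_{klm} = g_0 \delta_{mi} \delta_{kl} \;\to\; \sum_{klm} g^i_{klm} h_k h_l h_m = g_0 h_i \Big( \sum_k h_k^2 \Big), \tag{61}$$
$$I = I_0 + g_0 \left[ \mathrm{Tr} A\, \mathrm{Tr} D + 2\, \mathrm{Tr} AD \right], \tag{62}$$

where $D = [I + bA^{-1}]^{-1} - [I + bB^{-1}]^{-1}$ and I inside the brackets denotes the identity matrix.

case 3

$$g^i_{klm} = g_0 \;\to\; \sum_{klm} g^i_{klm} h_k h_l h_m = g_0 \Big( \sum_k h_k \Big)^3, \tag{63}$$
$$I = I_0 + 3 g_0 \Big( \sum_{kl} A_{kl} \Big) \Big( \sum_{mi} D_{mi} \Big). \tag{64}$$

case 4

$$g^i_{klm} = g_0 \delta_{ki} \delta_{li} \;\to\; \sum_{klm} g^i_{klm} h_k h_l h_m = g_0 h_i^2 \sum_m h_m, \tag{65}$$
$$I = I_0 + g_0 \sum_i \Big( A_{ii} \sum_m D_{im} + 2 D_{ii} \sum_m A_{im} \Big). \tag{66}$$

case 5

$$g^i_{klm} = g_0 \delta_{kl} \;\to\; \sum_{klm} g^i_{klm} h_k h_l h_m = g_0 \Big( \sum_k h_k^2 \Big) \Big( \sum_m h_m \Big), \tag{67}$$
$$I = I_0 + g_0 \Big( \mathrm{Tr} A \sum_{im} D_{mi} + \sum_{im} \big( [AD]_{mi} + [DA]_{mi} \big) \Big). \tag{68}$$

case 6

$$g^i_{klm} = g_0 \delta_{kl} \delta_{lm} \;\to\; \sum_{klm} g^i_{klm} h_k h_l h_m = g_0 \sum_k h_k^3, \tag{69}$$
$$I = I_0 + 3 g_0 \sum_{ik} A_{kk} D_{ki}. \tag{70}$$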
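The special cases above can be cross-checked numerically against a direct Wick contraction of eq. (41) with the cubic form of eq. (57). The sketch below is an illustration for a two-unit network: it writes out the three pairings explicitly (which reduces to the factor 3 of eq. (59) when $g^i_{klm}$ is symmetric in k, l, m) and uses $D = [A + bI]^{-1} A - [B + bI]^{-1} B$, which is algebraically the same matrix as the D defined after eq. (62). All function names and numerical values are assumptions made for this example.

```python
from itertools import product

def inv2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def shifted_inverse(M, b):
    """[M + b*I]^-1 for a 2x2 matrix M."""
    return inv2([[M[i][j] + (b if i == j else 0.0) for j in range(2)] for i in range(2)])

def d_matrix(A, B, b):
    """D = [A+bI]^-1 A - [B+bI]^-1 B."""
    DA = matmul2(shifted_inverse(A, b), A)
    DB = matmul2(shifted_inverse(B, b), B)
    return [[DA[i][j] - DB[i][j] for j in range(2)] for i in range(2)]

def nonlocal_correction(g, A, B, b):
    """First-order change of I for g_i(h) = sum_klm g[i][k][l][m] h_k h_l h_m."""
    MA, MB = shifted_inverse(A, b), shifted_inverse(B, b)
    total = 0.0
    for i, k, l, m, j in product(range(2), repeat=5):
        # Three Wick pairings of <h_k h_l h_m h_j> and of <h_k h_l h_m H_j>.
        wick_A = A[k][l] * A[m][j] + A[k][m] * A[l][j] + A[l][m] * A[k][j]
        wick_B = A[k][l] * B[m][j] + A[k][m] * B[l][j] + A[l][m] * B[k][j]
        total += g[i][k][l][m] * (MA[i][j] * wick_A - MB[i][j] * wick_B)
    return total

# Illustrative symmetric covariances and parameters (not from the paper).
A = [[2.0, 0.5], [0.5, 1.5]]
B = [[0.3, 0.1], [0.1, 0.2]]
b, g0 = 0.5, 0.1
D = d_matrix(A, B, b)

# Case 3: all components equal to g0, compared with eq. (64).
g_case3 = [[[[g0 for m in range(2)] for l in range(2)] for k in range(2)] for i in range(2)]
closed_case3 = 3 * g0 * sum(map(sum, A)) * sum(map(sum, D))
print(nonlocal_correction(g_case3, A, B, b), closed_case3)
```

The same routine can be fed any of the other tensors listed above; for non-symmetric choices such as case 2 the three pairings contribute differently, which is where the separate trace terms of eq. (62) come from.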



8 Final remarks 



In the present paper we have developed a perturbative approach for the calculation of the
mutual information in the case of a generic non-linear channel by means of Feynman
diagrams. As far as we know, this is the first attempt to use these techniques in the context of
the mutual information.

Our analysis is valid in the case of small non-linearity compared to the output noise and 
possibly for any flat sigmoidal transfer function of a noisy channel. 

We show systematically how the consecutive steps to calculate the mutual information
can be easily performed by introducing proper diagrammatic rules, in analogy to other standard
perturbative approaches [?].

We investigate in more detail the case of local non-linear transfer functions, when the
output of each unit depends only on its local field. Previous works have shown that this
regime provides an optimal information transfer [12]. Then we apply the same techniques
to the more general case of non-local non-linearities, restricted to cubic powers of h, where
the output of each unit depends on the total structure of the connectivities. This regime
corresponds to the case where the total output of the network is constrained in such a way
that the state of each output unit can be modified by any pair interaction.

Further developments of this analysis include the maximization of the mutual information
with respect to the coupling matrix J in order to find the optimal structure of the
connectivities. This should hopefully provide more interesting results compared to the linear
case [?], and it will be the object of future investigations [?].